feat: wrap provider.chat in llm.call span with timing and tokens by l50 · Pull Request #262 · dreadnode/ares

l50 · 2026-05-07T18:20:43Z

Key Changes:

- Wrapped each `provider.chat()` call inside `call_with_retry` in its own `llm.call` info span so timing and token usage are attributed to the attempt that produced them
- Captured per-attempt input, output, and cache token counts, duration, stop reason, and error message as span fields
- Recorded `task.id`, `llm.model`, `llm.attempt`, `llm.tool_count`, and `llm.message_count` at span creation for filterable Tempo queries

Added:

- Per-attempt `llm.call` `info_span!` in `ares-llm/src/agent_loop/retry.rs` with `Empty` placeholders for fields that are only known after the call returns (`llm.input_tokens`, `llm.output_tokens`, `llm.cache_read_tokens`, `llm.cache_creation_tokens`, `llm.duration_ms`, `llm.stop_reason`, `llm.error`)
- Wall-clock duration measurement via `std::time::Instant` recorded into `llm.duration_ms` so retry waits are not folded into the successful call's latency
- `tracing::Instrument` instrumentation of the `provider.chat()` future so async work runs inside the span context
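
The per-attempt timing described above can be modelled without the `tracing` crate. What follows is a minimal std-only sketch, not the code from this PR: `AttemptRecord`, `timed_attempt`, and `run_with_backoff` are hypothetical names, the provider is simulated by a closure, and `Option` fields stand in for the `Empty` span placeholders that are filled in only once the call returns.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for one `llm.call` span. `attempt` is known up
/// front; the `Option` fields start as `None`, playing the role of the
/// `Empty` placeholders that are recorded after the call returns.
#[derive(Debug, Default)]
struct AttemptRecord {
    attempt: u32,
    duration_ms: Option<u128>,
    error: Option<String>,
}

/// Runs a single attempt and times only the call itself.
fn timed_attempt<T, E: ToString>(
    attempt: u32,
    call: impl FnOnce() -> Result<T, E>,
) -> (Result<T, E>, AttemptRecord) {
    let mut rec = AttemptRecord { attempt, ..Default::default() };
    let start = Instant::now();
    let result = call();
    rec.duration_ms = Some(start.elapsed().as_millis());
    if let Err(e) = &result {
        rec.error = Some(e.to_string());
    }
    (result, rec)
}

/// Retries until success or `max_attempts`. The backoff sleep happens
/// between attempts, outside every record's timing window.
fn run_with_backoff(
    max_attempts: u32,
    backoff: Duration,
    mut call: impl FnMut(u32) -> Result<&'static str, String>,
) -> Vec<AttemptRecord> {
    let mut records = Vec::new();
    for attempt in 1..=max_attempts {
        let (result, rec) = timed_attempt(attempt, || call(attempt));
        records.push(rec);
        if result.is_ok() {
            break;
        }
        if attempt < max_attempts {
            sleep(backoff);
        }
    }
    records
}

fn main() {
    // Simulated provider: the first attempt fails fast, the second succeeds.
    let records = run_with_backoff(3, Duration::from_millis(200), |attempt| {
        if attempt == 1 {
            Err("429 rate limited".to_string())
        } else {
            Ok("done")
        }
    });
    for rec in &records {
        println!("{rec:?}");
    }
}
```

Because the clock starts and stops inside `timed_attempt`, the 200 ms backoff in this sketch is not charged to either attempt, which is the property the PR description attributes to `llm.duration_ms`.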

Changed:

- `ares-llm/src/agent_loop/retry.rs` `use` line now imports `std::time::Instant` plus `tracing::{field::Empty, info_span, Instrument}` alongside the existing `warn`
- Result handling in `call_with_retry` was split: the call result is first inspected to record token usage / stop reason / error on the span, then the existing retry decision logic runs on that same result
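
The split result handling can be illustrated with a small std-only sketch. Everything in it is assumed for illustration: `ChatResponse`, `ChatError`, `SpanFields`, `Decision`, `record_outcome`, and `decide` are hypothetical names, and the retry policy (retry on a rate limit, give up otherwise) is an assumption, not the logic in `call_with_retry`. The point is only the ordering: the result is first inspected by reference to fill in telemetry, then the same, unconsumed result drives the retry decision.

```rust
/// Hypothetical response shape; the real provider types are not shown here.
#[derive(Debug)]
struct ChatResponse {
    input_tokens: u64,
    output_tokens: u64,
    stop_reason: String,
}

/// Hypothetical error shape.
#[derive(Debug)]
enum ChatError {
    RateLimited,
    Fatal(String),
}

/// Stand-in for the span fields that are only known after the call returns.
#[derive(Debug, Default, PartialEq)]
struct SpanFields {
    input_tokens: Option<u64>,
    output_tokens: Option<u64>,
    stop_reason: Option<String>,
    error: Option<String>,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Done,
    Retry,
    GiveUp,
}

/// Step 1: inspect the result by reference and record telemetry.
/// The result itself is left untouched for the next step.
fn record_outcome(result: &Result<ChatResponse, ChatError>, fields: &mut SpanFields) {
    match result {
        Ok(resp) => {
            fields.input_tokens = Some(resp.input_tokens);
            fields.output_tokens = Some(resp.output_tokens);
            fields.stop_reason = Some(resp.stop_reason.clone());
        }
        Err(err) => fields.error = Some(format!("{err:?}")),
    }
}

/// Step 2: the retry decision runs on that same result (assumed policy).
fn decide(result: &Result<ChatResponse, ChatError>) -> Decision {
    match result {
        Ok(_) => Decision::Done,
        Err(ChatError::RateLimited) => Decision::Retry,
        Err(ChatError::Fatal(_)) => Decision::GiveUp,
    }
}

fn main() {
    let outcomes: Vec<Result<ChatResponse, ChatError>> = vec![
        Err(ChatError::RateLimited),
        Err(ChatError::Fatal("bad request".to_string())),
        Ok(ChatResponse {
            input_tokens: 120,
            output_tokens: 30,
            stop_reason: "end_turn".to_string(),
        }),
    ];
    for result in &outcomes {
        let mut fields = SpanFields::default();
        record_outcome(result, &mut fields);
        println!("{fields:?} -> {:?}", decide(result));
    }
}
```

Recording before deciding means a failed attempt still leaves its error on its own record, even when the decision is to retry.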

…tokens

Without per-call attribution there was no way to tell whether a slow agent loop was burning time inside provider.chat (network/LLM) or between calls (tool dispatch, queue waits). Token spend was visible only on the session log, not in Tempo traces.

Open one `llm.call` span per retry attempt around `provider.chat`. After the call returns, record duration_ms plus the four token-usage counters and stop_reason as span attributes; on error, record the formatted error. Each retry gets its own span so a 429 backoff does not inflate the duration attributed to the eventual successful call.

The span also carries `task.id`, `llm.model`, `llm.attempt`, and the request shape (tool/message counts) so Tempo queries can isolate slow calls without joining other spans.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

codecov · 2026-05-07T18:25:03Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.11%. Comparing base (60b2915) to head (5b67e1b).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #262   +/-   ##
=======================================
  Coverage   75.10%   75.11%           
=======================================
  Files         383      383           
  Lines       81465    81492   +27     
=======================================
+ Hits        61187    61214   +27     
  Misses      20278    20278


l50 changed the title ~~chore: update GitHub Actions dependencies and improve LLM call tracing~~ feat: wrap provider.chat in llm.call span with timing and tokens May 7, 2026

Merge branch 'main' into feat/telemetry-llm-call-span

5b67e1b

l50 merged commit 205ae6f into main May 8, 2026
11 checks passed

l50 deleted the feat/telemetry-llm-call-span branch May 8, 2026 01:06

l50 merged 2 commits into main from feat/telemetry-llm-call-span
